Estimating depth from a single RGB image is an ill-posed and inherently ambiguous problem. State-of-the-art deep learning methods can now estimate accurate 2D depth maps, but when these maps are projected into 3D, they lack local detail and are often highly distorted. We propose a fast-to-train two-streamed CNN that predicts depth and depth gradients, which are then fused together into an accurate and detailed depth map. We also define a novel set loss over multiple images; by regularizing the estimation between a common set of images, the network is less prone to over-fitting and achieves better accuracy than competing methods. Experiments on the NYU Depth v2 dataset show that our depth predictions are competitive with state-of-the-art methods and lead to faithful 3D projections.
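The abstract does not give the exact form of the set loss, but the idea of regularizing estimates across a common set of images can be sketched as a per-image fidelity term plus a pairwise consistency term. The function name, the pairwise formulation, and the weight `lam` below are all illustrative assumptions, not the paper's actual definition:

```python
import numpy as np

def set_loss(preds, gts, lam=0.5):
    """Hypothetical sketch of a set loss over multiple images:
    a per-image L2 depth term plus a pairwise term encouraging
    the differences between predictions in the set to match the
    differences between their ground truths (assumed formulation)."""
    preds = [np.asarray(p, dtype=float) for p in preds]
    gts = [np.asarray(g, dtype=float) for g in gts]
    # Per-image data fidelity: mean squared error against ground truth.
    data = sum(np.mean((p - g) ** 2) for p, g in zip(preds, gts))
    # Pairwise consistency: the relative depth between any two images
    # in the set should agree with the ground-truth relative depth.
    pair = 0.0
    n = len(preds)
    for i in range(n):
        for j in range(i + 1, n):
            pair += np.mean(((preds[i] - preds[j]) - (gts[i] - gts[j])) ** 2)
    return data + lam * pair
```

With perfect predictions both terms vanish; a shared bias across the set is penalized only by the data term, while inconsistent per-image errors are additionally penalized by the pairwise term.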